{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# [cknowledge.org/ai](http://cknowledge.org/ai): Crowdsourcing benchmarking and optimisation of AI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "A suite of open-source tools for [collecting knowledge on optimising AI](http://bit.ly/hipeac49-ckdl):\n",
    "\n",
    "* [Android app](https://play.google.com/store/apps/details?id=openscience.crowdsource.video.experiments&hl=en_GB)\n",
    "* [Desktop app](https://github.com/dividiti/ck-crowdsource-dnn-optimization)\n",
    "* [CK-Caffe](https://github.com/dividiti/ck-caffe)\n",
    "* [CK-Caffe2](https://github.com/ctuning/ck-caffe2)\n",
    "* [CK-TensorFlow](https://github.com/ctuning/ck-tensorflow)\n",
    "* [CK-TensorRT](https://github.com/ctuning/ck-tensorrt)\n",
    "* etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# [PUBLIC] Benchmarking Caffe with OpenBLAS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "platform_id = ''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Table of Contents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "1. [See the code](#code) [for developers]\n",
    "1. [Get the data](#data) [for developers]\n",
    "1. [See the tables](#tables)\n",
    "  1. [All data](#df_all)\n",
    "  1. [All execution time data](#df_time)\n",
    "  1. [Min execution time per batch](#df_min_time_per_batch)\n",
    "  1. [Min execution time per image](#df_min_time_per_image)\n",
    "  1. [Max images per second](#df_max_images_per_second)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<a id=\"code\"></a>\n",
    "## Data wrangling code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Includes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "#### Standard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import json\n",
    "import re"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "#### Scientific"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "If some of the scientific packages are missing, please install them using:\n",
    "```\n",
    "# pip install jupyter pandas numpy matplotlib\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import IPython as ip\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib as mp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "print ('IPython version: %s' % ip.__version__)\n",
    "print ('Pandas version: %s' % pd.__version__)\n",
    "print ('NumPy version: %s' % np.__version__)\n",
    "print ('Matplotlib version: %s' % mp.__version__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "from IPython.display import Image\n",
    "from IPython.core.display import HTML\n",
    "\n",
    "from IPython.display import display\n",
    "def display_in_full(df):\n",
    "    pd.options.display.max_columns = len(df.columns)\n",
    "    pd.options.display.max_rows = len(df.index)\n",
    "    display(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt; plt.style.use('classic')\n",
    "from mpl_toolkits.mplot3d import Axes3D\n",
    "from matplotlib import cm\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "default_title = 'Caffe w/ OpenBLAS'\n",
    "default_ylabel = 'Execution time (ms)'\n",
    "default_colormap = cm.autumn\n",
    "default_fontsize = 16\n",
    "default_figsize = [16, 8]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "#### Collective Knowledge"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "If CK is not installed, please install it using:\n",
    "```\n",
    "# pip install ck\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import ck.kernel as ck\n",
    "print ('CK version: %s' % ck.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Access the experimental data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "def get_experimental_results(repo_uoa, tags, time_ms='time_fw_ms'):\n",
    "    module_uoa = 'experiment'\n",
    "    r = ck.access({'action':'search', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'tags':tags})\n",
    "    if r['return']>0:\n",
    "        print (\"Error: %s\" % r['error'])\n",
    "        exit(1)\n",
    "    experiments = r['lst']\n",
    "    \n",
    "    dfs = []\n",
    "    for experiment in experiments:\n",
    "        data_uoa = experiment['data_uoa']\n",
    "        r = ck.access({'action':'list_points', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'data_uoa':data_uoa})\n",
    "        if r['return']>0:\n",
    "            print (\"Error: %s\" % r['error'])\n",
    "            exit(1)\n",
    "\n",
    "        # Get (lib_tag, model_tag) from a list of tags that should be available in r['dict']['tags'].\n",
    "        # Tags include 2 of the 3 irrelevant tags, a model tag and a lib tag.\n",
    "        # NB: Since it's easier to list all model tags than all lib tags, the latter list is not expicitly specified.\n",
    "        tags = r['dict']['tags']\n",
    "        print(tags)\n",
    "        irrelevant_tags = [ 'explore-batch-size-openblas-threads', 'caffe-time', platform_id ]\n",
    "        model_tags = [ 'bvlc-alexnet', 'bvlc-googlenet', 'deepscale-squeezenet-1.1', 'deepscale-squeezenet-1.0' ]\n",
    "        lib_model_tags = [ tag for tag in tags if tag not in irrelevant_tags ]\n",
    "        model_tags = [ tag for tag in lib_model_tags if tag in model_tags ]\n",
    "        lib_tags = [ tag for tag in lib_model_tags if tag not in model_tags ]\n",
    "        if len(lib_tags)==1 and len(model_tags)==1:\n",
    "            (lib, model) = (lib_tags[0], model_tags[0])\n",
    "        else:\n",
    "            continue\n",
    "        \n",
    "        for point in r['points']:\n",
    "            with open(os.path.join(r['path'], 'ckp-%s.0001.json' % point)) as point_file:\n",
    "                point_data_raw = json.load(point_file)\n",
    "            characteristics_list = point_data_raw['characteristics_list']\n",
    "            num_repetitions = len(characteristics_list)\n",
    "            platform = point_data_raw['features']['platform']['platform']['model']\n",
    "            batch_size = np.int64(point_data_raw['choices']['env'].get('CK_CAFFE_BATCH_SIZE', -1))\n",
    "            num_threads = np.int64(point_data_raw['choices']['env'].get('OPENBLAS_NUM_THREADS', -1))\n",
    "            # Obtain column data.\n",
    "            data = [\n",
    "                {\n",
    "                    # features\n",
    "                    'platform' : platform,\n",
    "                    # choices\n",
    "                    'lib' : lib,\n",
    "                    'model' : model,\n",
    "                    'batch_size' : batch_size,\n",
    "                    'num_threads' : num_threads,\n",
    "                    # statistical repetition\n",
    "                    'repetition_id': repetition_id,\n",
    "                    # runtime characteristics\n",
    "                    'time (ms)'   : characteristics['run'].get(time_ms, +1e9), # \"positive infinity\"\n",
    "                    'per layer info' : characteristics['run'].get('per_layer_info', []),\n",
    "                    'success?'    : characteristics['run'].get('run_success', 'n/a')\n",
    "                }\n",
    "                for (repetition_id, characteristics) in zip(range(num_repetitions), characteristics_list) \n",
    "            ]\n",
    "            # Deal with missing column data (resulting from failed runs).\n",
    "            if len(data)==1:\n",
    "                print ('[Warning] Missing data for: lib=%s, model=%s, batch_size=%d' % (lib, model, batch_size))\n",
    "                data = data * num_repetitions\n",
    "            # Construct a DataFrame.\n",
    "            df = pd.DataFrame(data)\n",
    "            # Set columns and index names.\n",
    "            df.columns.name = 'run characteristic'\n",
    "            df.index.name = 'index'\n",
    "            df = df.set_index([ 'platform', 'lib', 'model', 'num_threads', 'batch_size', 'repetition_id' ])\n",
    "            # Append to the list of similarly constructed DataFrames.\n",
    "            dfs.append(df)\n",
    "    # Concatenate all constructed DataFrames (i.e. stack on top of each other).\n",
    "    result = pd.concat(dfs)\n",
    "    return result.sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Plot images per second against the batch size and the number of threads"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "def plot_trisurf(df_model, x_col, y_col, z_col, x_label=None, y_label=None, z_label=None, title=None):\n",
    "    x = df_model[x_col]\n",
    "    y = df_model[y_col]\n",
    "    z = df_model[z_col]\n",
    "    \n",
    "    if x_label == None: x_label = x_col\n",
    "    if y_label == None: y_label = y_col\n",
    "    if z_label == None: z_label = z_col\n",
    "        \n",
    "    x_ticks = x.unique()\n",
    "    y_ticks = y.unique()\n",
    "    \n",
    "    fig = plt.figure(figsize=(24, 12), dpi=600)\n",
    "    ax = fig.add_subplot(111, projection='3d')\n",
    "    trisurf = ax.plot_trisurf(x, y, z, cmap=cm.autumn_r, linewidth=0.2, antialiased=True, shade=True)\n",
    "    ax.set_xlabel(x_label); ax.set_xticks(x_ticks); ax.set_xlim3d(x_ticks.max(), x_ticks.min())\n",
    "    ax.set_ylabel(y_label); ax.set_yticks(y_ticks); ax.set_ylim3d(y_ticks.min(), y_ticks.max())\n",
    "    ax.set_zlabel(z_label); ax.set_zlim3d(z.min(), z.max())\n",
    "    ax.set_title(title, fontsize=20)\n",
    "    fig.colorbar(trisurf, shrink=0.5, aspect=10)\n",
    "    return fig"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<a id=\"data\"></a>\n",
    "## Get the experimental data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The Caffe experimental data was collected on the experimental platform (after installing all Caffe libraries and models of interest) as follows:\n",
    "```\n",
    "$ cd `ck find ck-caffe:script:explore-batch-size-openblas-threads`\n",
    "$ python explore-batch-size-openblas-threads-benchmarking.py\n",
    "```\n",
    "It can be downloaded and registered with CK e.g. as follows:\n",
    "```\n",
    "$ ck add repo:ck-caffe-openblas-threads --url=https://github.com/dividiti/ck-caffe-samsung-chromebook2.git\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<a id=\"tables\"></a>\n",
    "## Tables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<a id=\"df_all\"></a>\n",
    "### All data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "df_all = get_experimental_results(\n",
    "    repo_uoa='ck-caffe-openblas-threads ',\n",
    "    tags='explore-batch-size-openblas-threads') \\\n",
    "    .reset_index(['platform', 'lib'], drop=True)\n",
    "display_in_full(df_all)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<a id=\"df_time\"></a>\n",
    "### All execution time data indexed by repetitions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "df_time = df_all['time (ms)'].unstack(df_all.index.names[:-1])\n",
    "display_in_full(df_time)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<a id=\"df_min_time_per_batch\"></a>\n",
    "### Min execution time per batch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "df_min_time_per_batch = df_time.describe().loc['min'].unstack(level='batch_size')\n",
    "display_in_full(df_min_time_per_batch)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<a id=\"df_min_time_per_image\"></a>\n",
    "### Min execution time per image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "batch_sizes = df_min_time_per_batch.columns.tolist()\n",
    "df_min_time_per_image = df_min_time_per_batch / batch_sizes\n",
    "display_in_full(df_min_time_per_image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<a id=\"df_max_images_per_second\"></a>\n",
    "### Max images per second"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "df_min_seconds_per_image = 1e-3 * df_min_time_per_image\n",
    "df_max_images_per_second = 1 / df_min_seconds_per_image\n",
    "display_in_full(df_max_images_per_second)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# A workaround to make \"Delaunay triangulation calculation\" to work with a single batch size:\n",
    "# insert a duplicate column using the negative value of the batch size.\n",
    "if df_max_images_per_second.columns.size == 1:\n",
    "    batch_size = df_max_images_per_second.columns[0]\n",
    "    df_max_images_per_second[-batch_size] = df_max_images_per_second.values\n",
    "    display_in_full(df_max_images_per_second)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "models = df_max_images_per_second.index.get_level_values('model').unique()\n",
    "for model in models:\n",
    "    df_model = df_max_images_per_second \\\n",
    "        .loc[model] \\\n",
    "        .unstack() \\\n",
    "        .reset_index() \\\n",
    "        .rename(columns={0 : 'time (ms)'}) \\\n",
    "        .dropna() \\\n",
    "        .sort_values(by='batch_size', ascending=False)\n",
    "    fig = plot_trisurf(df_model, title=model,\n",
    "                 x_col='num_threads', x_label='Number of threads',\n",
    "                 y_col='batch_size', y_label='Batch size',\n",
    "                 z_col='time (ms)', z_label='Images per second')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
